Active Information Acquisition for Improved Clustering
Authors
Abstract
Many datasets include feature values that are missing but may be acquired at a cost. In this paper, we consider the clustering task for such datasets, and address the problem of acquiring missing feature values that improve clustering quality in a cost-effective manner. Since acquiring all missing information may be unnecessarily expensive, we propose a framework for iteratively selecting the feature values that yield the greatest improvement in clustering quality per unit cost. Our framework can be adapted to different clustering algorithms, and we illustrate it in the context of two popular methods, K-Means and hierarchical agglomerative clustering. Experimental results on several datasets show that the proposed framework improves clustering accuracy over random acquisition. Additional experiments demonstrate the performance of the framework for different cost structures, and explore several alternative formulations of the acquisition strategy.
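The iterative selection rule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the names `greedy_acquire`, `estimate_gain`, `toy_gain`, and `toy_cost` are hypothetical, costs are assumed to be strictly positive, and the gain estimator is a placeholder for whatever clustering-quality estimate a concrete method (for example one built on K-Means or hierarchical agglomerative clustering) would supply.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Cell = Tuple[int, int]  # (instance index, feature index) of one missing value


def greedy_acquire(
    missing: Sequence[Cell],
    cost: Dict[Cell, float],
    estimate_gain: Callable[[Cell, List[Cell]], float],
    budget: float,
) -> Tuple[List[Cell], float]:
    """Repeatedly acquire the affordable missing value with the best
    estimated quality gain per unit cost, until the budget runs out.

    estimate_gain(cell, acquired) is a hypothetical callback; it receives
    the cells acquired so far so that estimates can be refreshed after
    each purchase. Costs are assumed to be strictly positive.
    """
    remaining = sorted(missing)  # sorted so that ties break deterministically
    acquired: List[Cell] = []
    spent = 0.0
    while remaining:
        # Only consider values whose cost still fits in the budget.
        affordable = [c for c in remaining if spent + cost[c] <= budget]
        if not affordable:
            break
        # Pick the candidate with the highest gain-to-cost ratio.
        best = max(affordable, key=lambda c: estimate_gain(c, acquired) / cost[c])
        acquired.append(best)
        spent += cost[best]
        remaining.remove(best)
    return acquired, spent


# Toy numbers, invented purely for illustration.
toy_gain = {(0, 1): 0.3, (2, 0): 1.0, (3, 1): 1.2}
toy_cost = {(0, 1): 1.0, (2, 0): 2.0, (3, 1): 4.0}

chosen, spent = greedy_acquire(
    list(toy_gain), toy_cost, lambda cell, done: toy_gain[cell], budget=3.0
)
print(chosen, spent)
```

In this toy run the value with the largest raw gain, `(3, 1)`, is never bought because its cost exceeds the budget, while the cheaper value `(2, 0)` is chosen first for its better gain-to-cost ratio. In a realistic setting the estimator would be recomputed after each acquisition, since every newly observed value can change the clustering.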
Related articles
Improved Automatic Clustering Using a Multi-Objective Evolutionary Algorithm With New Validity measure and application to Credit Scoring
In data mining, clustering is an important technique for separating unlabeled data into groups. In this paper, an attempt has been made to improve and optimize the application of heuristic clustering methods such as the genetic algorithm, PSO, artificial bee colony, harmony search and differential evolution on the unlabeled data of an Iranian bank...
An improved methodology on information distillation by mining program source code
This paper presents a methodology for knowledge acquisition from source code. We use data mining to support semiautomated software maintenance and comprehension and provide practical insights into systems specifics, assuming one has limited prior familiarity with these systems. We propose a methodology and an associated model for extracting information from object oriented code by applying clus...
An Improved K-Means with Artificial Bee Colony Algorithm for Clustering Crimes
Crime detection is one of the major issues in the field of criminology. In fact, criminology includes knowing the details of a crime and its intangible relations with the offender. In spite of the enormous amount of data on offenses and offenders, and the complex and intangible semantic relationships between this information, criminology has become one of the most important areas in the field o...
The Application of Combined Fuzzy Clustering Model and Neural Networks to Measure Valuably of Bank Customers
Currently, acquisition of resources in banks is subject to attraction of the resources of banking customers. Meanwhile, the Bank’s valuable customers are one of the best resources to make profit for banks. Several different models are introduced for evaluation of profitability of the customers; but most of them are classical models and they are unable to evaluate the customers in complete and o...
Publication date: 2007